chore(spanner): use channel affinity#13231
Conversation
| } | ||
|
|
||
| public static TracingFramework getActiveTracingFramework() { | ||
| synchronized (lock) { |
There was a problem hiding this comment.
FYI: this change is unrelated, but since it was small perf change I included it here, this was on critical request path being shown up in benchmark mutex profile
There was a problem hiding this comment.
Code Review
This pull request replaces the use of logical affinity keys and explicit unbinding with direct channel ID affinity using AtomicReference for Spanner operations. This change simplifies the integration with grpc-gcp by removing the need for affinity-key map management and the associated cleanup logic. Additionally, SpannerOptions was updated to use a volatile variable for the active tracing framework, removing unnecessary synchronization. I have no feedback to provide.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a ChannelAffinityRef mechanism to GcpManagedChannel to facilitate sticky channel routing without the need for an internal affinity-key map. This new approach is integrated into the Spanner client, replacing the previous CHANNEL_HINT and UNBIND_CHANNEL_HINT logic, and allows for more direct channel management during retries. The changes also include the removal of the GrpcGcpAffinityUtil class, the deprecation of explicit affinity cleanup methods in SpannerRpc, and updates to SpannerOptions for more efficient tracing framework access. Regarding the feedback, a potential NullPointerException was identified in GcpManagedChannel.newCall when no channels are available; it is recommended to handle this case by returning a NoopGcpClientCall.
There was a problem hiding this comment.
Code Review
This pull request replaces the CHANNEL_HINT and UNBIND_CHANNEL_HINT mechanism with a new ChannelAffinityRef system for managing sticky channel routing in gRPC-GCP. This change eliminates the need for gRPC-GCP to maintain internal affinity-key maps for Spanner operations. Additionally, SpannerOptions was optimized by replacing synchronized access to the tracing framework with a volatile field. Review feedback identified a potential logic error in the channel selection loop where the 'use different channel' flag could be lost during retries, and recommended adding activity checks when picking fallback channels.
1f07a11 to
36329d8
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new ChannelAffinityRef mechanism in GcpManagedChannel to replace the previous hint-based affinity system, simplifying channel lifecycle management for Cloud Spanner. The changes remove the need for explicit affinity unbinding and the GrpcGcpAffinityUtil class. Additionally, SpannerOptions was updated to use a volatile field for the tracing framework to improve concurrency. A critical review comment points out that getChannelRefByAffinityRef could return null, potentially causing a NullPointerException in newCall, and suggests throwing an exception instead to maintain invariants.
| */ | ||
| protected ChannelRef getChannelRefByAffinityRef(ChannelAffinityRef affinityRef) { | ||
| maybeDynamicUpscale(); | ||
| while (true) { |
There was a problem hiding this comment.
nit: can we add a clarifying comment for why we are looping here?
| } | ||
| ByteString id = getTransactionId(); | ||
| if (id != null && !id.isEmpty()) { | ||
| rpc.clearTransactionAndChannelAffinity(id, Option.CHANNEL_HINT.getLong(channelHint)); |
There was a problem hiding this comment.
This not only cleaned up channel affinity, but also transaction affinity for location-aware routing, right? I think that we still need the latter.
There was a problem hiding this comment.
Yes, it's intended.
For Omni, irrespective of whether it's a strong/stale single/multi-use read-only, each RPC can land to the server based on the routing hint, so pinning to the same endpoint for read-only transactions is not needed.
Only read-write transactions need pinning to the same endpoint, and since there is only one gRPC channel per endpoint, there is no need to maintain affinities, so we can safely remove it.
| public void clearTransactionAndChannelAffinity( | ||
| ByteString transactionId, @Nullable Long channelHint) { | ||
| if (keyAwareChannel != null) { | ||
| keyAwareChannel.clearTransactionAndChannelAffinity(transactionId, channelHint); |
There was a problem hiding this comment.
Are you sure that we do not need this call anymore? (See also my other comment on AbstractReadContext)
|
|
||
| void clearTransactionAndChannelAffinity(ByteString transactionId, @Nullable Long channelHint) { | ||
| String address = transactionAffinities.asMap().remove(transactionId); | ||
| readOnlyTxPreferLeader.invalidate(transactionId); |
There was a problem hiding this comment.
Is this also no longer needed? (I mean specifically the readOnlyTxPreferLeader.invalidate call)
| public static final CallOptions.Key<Integer> CHANNEL_ID_KEY = | ||
| CallOptions.Key.create("GcpChannelId"); | ||
|
|
||
| /** CallOptions key for sticky channel routing without affinity-key map state. */ |
There was a problem hiding this comment.
Overall, there are quite big changes to grpc-gcp being made in this pull request, but there are no tests that verify these changes. Can we add tests that cover the changes that we make to grpc-gcp here? Relying on tests in the Spanner client is not enough, as this is a standalone library that can be used by other clients.
There was a problem hiding this comment.
The test coverage for the changes to the Spanner client is also quite thin, but the existing tests generally do cover these changes. One interesting test (if possible) would be a test that really verifies that all requests in a single read/write or multi-use read-only transaction really all use the same gRPC channel (so basically checking the local port where the requests are coming from on the mock server).
There was a problem hiding this comment.
Added some more tests


Internal reference: go/grpc-gcp-fixes#bookmark=id.q4x3oa8l672